Projects

Blog Posts

Debezium Series, Part 5: Sink Connectors — Delta Lake & Iceberg

Landing CDC events into open table formats. Upsert and delete semantics with Delta Lake MERGE, Iceberg MERGE INTO, partition strategies, and JDBC sink for relational targets.

Databricks Series, Part 6: ML Serving and Workflows

Batch and real-time model inference, Databricks Model Serving endpoints, and orchestrating the full ML pipeline with Databricks Workflows.

Databricks Series, Part 5: Machine Learning with MLflow

Tracking experiments, logging models and artifacts, comparing runs, and managing the model lifecycle with MLflow on Databricks.

Databricks Series, Part 4: Feature Engineering at Scale

Databricks Feature Store, FeatureEngineeringClient, FeatureLookup, training sets, and eliminating training-serving skew.

Databricks Series, Part 3: Data Ingestion with Auto Loader

cloudFiles format, schema inference, schema evolution, and building robust incremental ingestion pipelines on Databricks.

Databricks Series, Part 2: Lakehouse Architecture

Unity Catalog for governance and discovery, the medallion Bronze/Silver/Gold pattern, and Delta tables as the storage foundation.

Databricks Series, Part 1: Getting Started

Navigating the Databricks workspace, launching clusters, writing notebooks, and submitting your first PySpark job.

Databricks Series, Part 0: Overview

The lakehouse platform concept, what Databricks adds on top of Spark and Delta Lake, and how it compares to alternatives.

Delta Lake Series, Part 6: Streaming & CDC

Writing to Delta with Structured Streaming, exactly-once guarantees, reading Delta as a stream, and Change Data Feed for downstream propagation.

Delta Lake Series, Part 5: Performance Optimization

Making Delta Lake queries fast — OPTIMIZE, Z-ordering, data skipping with column statistics, compaction, and partitioning strategies.

Delta Lake Series, Part 4: Time Travel & Versioning

Querying historical snapshots by version or timestamp, rolling back bad writes, auditing the table history, and managing retention with VACUUM.

Delta Lake Series, Part 3: Schema Enforcement & Evolution

How Delta Lake validates schemas on write, rejects incompatible data, and handles controlled schema changes over time.

Delta Lake Series, Part 2: Transaction Log & ACID

How the Delta Lake transaction log enables atomicity, serializable isolation, optimistic concurrency, and conflict resolution.

Delta Lake Series, Part 1: Getting Started

Creating Delta tables, reading and writing with Spark, Delta SQL, and what the _delta_log looks like in practice.

Delta Lake Series, Part 0: Overview

The data lake reliability problem, what Delta Lake adds on top of Parquet, and how it compares to Apache Iceberg and Apache Hudi.